Can Text Simplification Help Machine Translation?

نویسندگان

  • Sanja Stajner
  • Maja Popovic
چکیده

This article explores the use of text simplification as a pre-processing step for statistical machine translation of grammatically complex under-resourced languages. Our experiments on English-to-Serbian translation show that this approach can improve grammaticality (fluency) of the translation output and reduce technical post-editing effort (number of post-edit operations). Furthermore, the use of more aggressive text simplification methods (which do not only simplify the given sentence but also discard irrelevant information thus producing syntactically very simple sentences) also improves meaning preservation (adequacy) of the translation output.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Survey of Automated Text Simplification

Text simplification modifies syntax and lexicon to improve the understandability of language for an end user. This survey identifies and classifies simplification research within the period 1998-2013. Simplification can be used for many applications, including: Second language learners, preprocessing in pipelines and assistive technology. There are many approaches to the simplification task, in...

متن کامل

Sentence Alignment Methods for Improving Text Simplification Systems

We provide several methods for sentencealignment of texts with different complexity levels. Using the best of them, we sentence-align the Newsela corpora, thus providing large training materials for automatic text simplification (ATS) systems. We show that using this dataset, even the standard phrase-based statistical machine translation models for ATS can outperform the state-of-the-art ATS sy...

متن کامل

Optimizing Statistical Machine Translation for Text Simplification

Most recent sentence simplification systems use basic machine translation models to learn lexical and syntactic paraphrases from a manually simplified parallel corpus. These methods are limited by the quality and quantity of manually simplified corpora, which are expensive to build. In this paper, we conduct an indepth adaptation of statistical machine translation to perform text simplification...

متن کامل

Simplifying the Bible and Wikipedia Using Statistical Machine Translation

I started this work with the hope of generating a text synthesizer (like a musical synthesizer) that can imitate certain linguistic styles. Most of the report focuses on text simplification using statistical machine translation (SMT) techniques. I applied MOSES to a parallel corpus of the Bible (King James Version and Easy-to-Read Version) and that of Wikipedia articles (normal and simplified)....

متن کامل

Building a Monolingual Parallel Corpus for Text Simplification Using Sentence Similarity Based on Alignment between Word Embeddings

Methods for text simplification using the framework of statistical machine translation have been extensively studied in recent years. However, building the monolingual parallel corpus necessary for training the model requires costly human annotation. Monolingual parallel corpora for text simplification have therefore been built only for a limited number of languages, such as English and Portugu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016